ABSTRACT

Cleavage of introns from precursor transfer RNAs 
(tRNAs) by tRNA splicing endonuclease (EndA) is 
essential for tRNA maturation in Archaea and 
Eukarya. In the past, archaeal EndAs were classified
into three types (α′₂, α₄ and α₂β₂) according to
subunit composition. Recently, we identified
a fourth type of archaeal EndA from an uncultivated
archaeon, Candidatus Micrarchaeum acidiphilum,
referred to as ARMAN-2, which is deeply branched
within Euryarchaea. The ARMAN-2 EndA forms an ε₂
homodimer and has broad substrate specificity like
the α₂β₂ type EndAs found in Crenarchaea and
Nanoarchaea. However, the precise architecture of
ARMAN-2 EndA was unknown. Here, we report the
crystal structure of the ε₂ homodimer of ARMAN-2
EndA. The structure reveals that the ε protomer is
separated into three novel units (αN, α and βC) fused
by two distinct linkers, although the overall
structure of ARMAN-2 EndA is similar to those of
the other three types of archaeal EndAs. Structural
comparison and mutational analyses reveal that an
ARMAN-2 type-specific loop (ASL) is involved in
the broad substrate specificity and that K161 in the
ASL functions as the RNA recognition site. These
findings suggest that the broad substrate
specificities of ε₂ and α₂β₂ EndAs were separately
acquired through different evolutionary processes.



INTRODUCTION 

Transfer RNA (tRNA) is an adapter molecule that acts as 
a translator of genetic information from nucleotide 
sequence of messenger RNA to amino acid sequence of 
protein. Because tRNA needs to go through a maturation 
process in order to synthesize proteins correctly and 
smoothly, tRNA maturation is essential for life. tRNA 
splicing, which removes introns and joins exons in precur- 
sor (pre)-tRNA, is an important process in tRNA 
maturation. 

Many pre-tRNAs interrupted by introns have been found in all three
domains of life. In Eukarya, introns are predominantly located in the
canonical position between nucleotide positions 37 and 38 in the
anticodon loop of tRNA, while archaeal introns are located not only in
the canonical position but also in various non-canonical positions
including the D- and T-loops, the variable region and the aminoacyl
stem (1). In some cases, single archaeal pre-tRNAs include two or three
introns, called multiple-introns, in non-canonical positions (2,3). The
introns in eukaryotic cytoplasmic and archaeal pre-tRNA are removed by
a tRNA splicing endonuclease (EndA) (4-6). The eukaryotic EndA consists
of four subunits (SEN2, SEN15, SEN34 and SEN54) (7,8). In contrast,
archaeal EndAs are classified into three types by subunit composition,
namely, homotetramer (α₄), homodimer (α′₂) and heterotetramer (α₂β₂)
[(9) and Figure 1]. Figure 1 shows the structures and characteristics
of the three types of archaeal EndAs. The α subunit in the archaeal
EndAs is a catalytic subunit and shares homology with the SEN2 and
SEN34 subunits of eukaryotic EndA, implying a common evolutionary
origin between the eukaryotic and archaeal EndAs (7,10). Both the
eukaryotic and archaeal EndAs are proposed to use a cleavage chemistry
similar to that of ribonuclease A (6,11). The archaeal α subunit has a
catalytic triad, an L10 loop and a pocket (Figure 1). In the case of
α′₂ EndAs, the α subunit contains two α units joined by a polypeptide
linker (Figure 1B). In Figure 1, the locations of the catalytic triad
in archaeal EndAs are marked by green circles. The negatively-charged
L10 loop and positively-charged pocket contribute to subunit
interaction and are conserved in the three types of archaeal EndAs.
However, the substrate recognition mechanisms of EndAs differ to some
extent. The eukaryotic EndA requires the mature domain of pre-tRNA for
the recognition of cleavage sites in the canonical position (12),
whereas the three types of archaeal EndAs can remove introns with a
bulge-helix-bulge (BHB) motif irrespective of the existence of the
pre-tRNA mature domain. Furthermore, the α₂β₂ type of archaeal EndA
possesses a broad substrate specificity: it recognizes relaxed BHB
motifs of various lengths, in which either the 5′- or 3′-bulge of the
BHB is disrupted (so-called HBh′ and hBH), as well as the strict BHB
motif (Figure 1C and D). In contrast, the α′₂ and α₄ type EndAs
recognize only the strict BHB motif (13-19). The HBh′ and hBH motifs
are often found at non-canonical positions in introns of pre-tRNAs from
Crenarchaea and Nanoarchaea, consistent with the possession of the α₂β₂
type EndA (3). Some of these pre-tRNAs are split into two or three gene
fragments at different loci and are called split or tri-split tRNAs,
respectively (20,21). Furthermore, permuted tRNA, in which the 5′ and
3′ halves of the coding sequences separated by intervening elements
have their positions switched, has been discovered in some genera of
Crenarchaea (22). Only the α₂β₂ type of archaeal EndA with broad
substrate specificity has the ability to excise non-canonical introns,
suggesting the coevolution of disrupted tRNA gene diversity and EndA
architecture (16,21). In Crenarchaeal EndAs, the Crenarchaea-specific
loop (CSL) is conserved in the catalytic α subunit (19). Our previous
study revealed that the CSL is responsible for the broad substrate
specificity and that a conserved Lys residue in the CSL functions as
the substrate recognition site (23).

Figure 1. Structures and characteristics of three types of archaeal
EndAs. The subunit interactions are represented by cartoon models on
the left side. The β-β interaction responsible for inter/intraunit
formation, and the L10 loop and pocket responsible for dimer/tetramer
formation, are highlighted. The catalytic triads are marked by green
circles. The right panels show the ribbon models of EndAs. The sources
of EndAs are shown in parentheses and the substrate specificities of
EndAs are shown next to the parentheses: (A) α₄ type MJ-EndA; (B) α′₂
type AFU-EndA; (C) α₂β₂ type APE-EndA. Full names of the archaeal
species are as follows: MJ, Methanocaldococcus jannaschii; AFU,
Archaeoglobus fulgidus; APE, Aeropyrum pernix. (D) Secondary structure
diagrams of substrate RNA motifs: left, strict BHB motif; right,
relaxed BHB motifs (hBH and HBh′). The splicing sites are indicated by
arrows. B and H represent bulge and helix, respectively. The h and h′
indicate the helices close to the 3-nt bulge on the exonic side and
intronic side, respectively.

Recently, we found a fourth type of archaeal EndA from an uncultivated
archaeon, Candidatus Micrarchaeum acidiphilum, referred to as ARMAN
(Archaeal Richmond Mine Acidophilic Nanoorganism)-2 (24), which was
discovered in an acid mine drainage site at Iron Mountain in Northern
California (25). Our biochemical and bioinformatic analyses have led us
to propose that the ARMAN-2 EndA has a novel three-unit architecture
that consists of two duplicated catalytic α units and one structural β
unit encoded by a single gene (24). Our cross-linking analysis showed
that two three-unit protomers are assembled into the functional ε₂,
where ε represents the union of three units (αp-α-β) (24). The amino
acid sequences of the α and β units (127 and 97 amino acids,
respectively) are similar to those of the catalytic α and structural β
subunits in the other three types (α′₂, α₄ and α₂β₂) of archaeal EndAs.
In contrast, the αp unit (163 amino acids) is a pseudo-catalytic unit
since three residues (His, Tyr and Lys) comprising the catalytic triad
and positively-charged residues responsible for dimer formation are
mutated. The question therefore arises as to how the αp unit interacts
with the α and β units in the ε₂ architecture. The precise architecture
of the three-unit interactions will provide new insights into the
molecular evolution of archaeal EndA. Furthermore, remarkably, the
ARMAN-2 EndA possesses a broad substrate specificity, cleaving introns
with both strict and relaxed BHB motifs despite lacking the CSL region.
What structural properties of ARMAN-2 EndA confer the broad substrate
specificity? Structural determination of ARMAN-2 EndA is necessary to
address these issues. We present herein an X-ray crystal structure of
ARMAN-2 EndA, demonstrating a novel three-unit arrangement of the ε₂
homodimeric complex. Our structural comparison of ARMAN-2 (ε₂) and the
other three types (α′₂, α₄ and α₂β₂) of archaeal EndAs shows that the
ARMAN-2 EndA possesses an ARMAN-2 type-specific loop (ASL). Our
structure-guided mutagenesis study identified the catalytic residues
and revealed that the ASL is responsible for the broad substrate
specificity. Furthermore, our study suggests that the Lys residue in
the ASL plays the same role as the Lys residue in the CSL in broad
substrate recognition and that the ASL was acquired through an
evolutionary pathway independent of that of the CSL.

MATERIALS AND METHODS 

Protein expression and purification 

A pET-23b vector (Novagen) harboring an ARMAN-2 EndA gene attached to a
6×His tag at its C-terminus has been previously constructed (24). The
plasmid was used for overexpressing the recombinant ARMAN-2 EndA in
Escherichia coli Rosetta 2(DE3) strain (Novagen). Escherichia coli
cells harboring the plasmid were grown in LB medium supplemented with
100 µg/ml of ampicillin at 37°C, and then isopropylthio-β-galactoside
(IPTG) was added to a final concentration of 0.5 mM when the cell
density reached OD600 ≈ 0.8. After cultivation at 20°C for 24 h, the
cells were harvested by centrifugation (6000 rpm at 4°C for 20 min).
The cells were suspended in 15 ml buffer A [20 mM Tris-HCl (pH 7.6),
200 mM KCl, 20 mM imidazole, 10 mM 2-mercaptoethanol and 5% glycerol]
supplemented with protease inhibitor cocktail (Roche) and then
disrupted with an ultrasonic disruptor (model VCX-500, Sonics &
Materials, Inc., USA). A fraction of E. coli proteins was denatured by
heat treatment at 50°C for 20 min and removed by centrifugation
(18 000 rpm at 4°C for 20 min). The supernatant was loaded onto a
Ni-NTA Superflow column (Qiagen) equilibrated with buffer A and then
the enzyme was eluted with buffer A containing 500 mM imidazole. The
eluted fractions were collected and then loaded onto a HiTrap
Heparin-Sepharose column (GE Healthcare) equilibrated with buffer B
[20 mM Tris-HCl (pH 7.6), 50 mM KCl, 10 mM 2-mercaptoethanol and 5%
glycerol]. The bound protein was eluted with a linear gradient of
buffer B from 50 mM to 1 M KCl. The eluted fractions were collected and
then concentrated to ~3 ml using Amicon Ultra-15 centrifugal filter
units. Finally, the concentrated protein was applied to a HiLoad 16/60
Superdex 75 pg column (GE Healthcare) equilibrated with buffer C [20 mM
Tris-HCl (pH 7.6), 700 mM NaCl, 10 mM 2-mercaptoethanol and 5%
glycerol]. The single peak fractions were collected. Mutant genes were
generated using the QuikChange site-directed mutagenesis kit
(Stratagene), and the mutations were verified by DNA sequencing. Mutant
proteins were expressed and purified in the same manner as the
wild-type protein. The recombinant Archaeoglobus fulgidus (AFU)-EndA
and its chimera mutants were prepared as reported previously (23). The
protein purities were confirmed by SDS-PAGE (Supplementary Figure S1).



Crystallization

The single-peak fractions from the Superdex-75 gel-filtration column
were pooled and then concentrated to ~10 mg/ml using Vivaspin 15R
centrifugal filter units (Sartorius Stedim Biotech). Initial trials for
crystallization of the ARMAN-2 EndA were performed by the hanging-drop
vapor diffusion method using a Crystal Screening Kit (Hampton
Research). The drop solution was equilibrated against 200 µl of
reservoir solution at 22°C. A few crystals were obtained under some of
the tested conditions that contained PEG 3350 as the precipitant. Based
on the initial crystallization conditions, we then searched for optimum
conditions. When the ARMAN-2 EndA protein solution was mixed with an
equal volume of a crystallization solution that contained 18% PEG 3350
and 0.2 M tri-ammonium citrate (pH 7.0), full-sized rectangular
crystals (200 × 100 × 100 µm) grew within 5 days at 22°C. For
experimental phase determination by the single-wavelength anomalous
dispersion (SAD) method, the crystal was soaked in mother liquor
supplemented with 0.4 mM K₂PtCl₄ at 22°C for 16 h. Cryo-protection of
the native and Pt-derivative crystals was achieved by stepwise transfer
to the respective artificial mother liquor containing 25% glycerol. The
crystals were then flash-frozen in liquid nitrogen.

Data collection and structure determination

X-ray diffraction data sets from native crystals (λ = 1.0000 Å) and SAD
data sets from Pt-derivative crystals (λ = 1.0717 Å) were collected at
100 K on the BL38B1 beamline at SPring-8 (Hyogo, Japan). All data sets
were processed, merged and scaled using the HKL2000 program (26). Using
the Pt-SAD data set, all 19 Pt positions were identified and refined in
the orthorhombic space group P2₁2₁2₁, and the initial phase was
calculated by using AutoSol in PHENIX (27), followed by automated model
building using RESOLVE (28). The resulting map and partial model were
used for manually building the model using COOT (29). The model was
further refined by using PHENIX (27). Using the native data set and the
refined model as a search coordinate, the structure of the ARMAN-2 EndA
was determined by molecular replacement with the Phaser program (30).
The model was further manually built with COOT (29) and refined with
PHENIX (27). The structure of ARMAN-2 EndA was refined to Rwork/Rfree
of 21.8%/25.7% at 2.25 Å resolution (Table 1). The crystal belonged to
the space group P3₂, with two ARMAN-2 EndA molecules in the asymmetric
unit. The final model contained residues 2-387 (chains A and B) and 152
water molecules. The stereochemical quality of the final model was
checked using PROCHECK (31), and the Ramachandran statistics are given
in Table 1. The structure factors and coordinates have been deposited
in the Protein Data Bank (PDB code 4FZ2). All structural figures were
generated by PyMOL (DeLano Scientific, Palo Alto, CA).



Table 1. Data collection and refinement statistics

                              ARMAN-2 EndA            ARMAN-2 EndA Pt-derivative
Data collection
  Space group                 P3₂                     P2₁2₁2₁
  Cell dimensions
    a, b, c (Å)               112.02, 112.02, 81.0    75.62, 85.17, 140.19
    α, β, γ (°)               90, 90, 90              90, 90, 90
  Resolution (Å)              50 to 2.25 (2.33-2.25)  50 to 2.05 (2.07-2.00)
  Rmerge (a) (%)              6.5 (49.2)              5.6 (35.1)
  I/σI                        39.2 (4.7)              12.8 (10.6)
  Completeness (%)            99.4 (95.4)             99.7 (98.6)
  Redundancy                  5.9 (5.5)               12.9 (11.7)
Refinement
  Resolution (Å)              37.4-2.25
  No. reflections             49 748
  Rwork (b)/Rfree (c) (%)     21.8/25.7
  No. atoms                   6302
    Protein                   6454
    Water                     152
  Avg. B-factors (Å²)         51.5
  R.m.s.d.
    Bond lengths (Å)          0.006
    Bond angles (°)           1.0
  Ramachandran plot (%)
    Most favored              90.2
    Additional allowed        9.1
    Generously allowed        0.7
    Disallowed                0.0

The value in parentheses is for the highest resolution shell.
(a) $R_{\mathrm{merge}} = \sum_h \sum_j |\langle I(h)\rangle - I(h)_j| / \sum_h \sum_j \langle I(h)\rangle$, where $\langle I(h)\rangle$ is the mean
intensity of symmetry-equivalent reflections.
(b) $R_{\mathrm{work}} = \sum ||F_p(\mathrm{obs})| - |F_p(\mathrm{calc})|| / \sum |F_p(\mathrm{obs})|$.
(c) $R_{\mathrm{free}}$ = R-factor for a selected subset (10%) of reflections that was not
included in earlier refinement calculations.

Intron-cleavage assay by the splicing endonuclease

The transcripts of ARMAN-2 pre-tRNAIle (UAU) and pre-tRNACys (GCA) were
prepared using T7 RNA polymerase as described in our previous report
(24). Splicing reactions were performed as follows. EndA (1.0 µg) was
mixed with 0.2 nmol transcripts in 50 µl buffer D [50 mM Tris-HCl (pH
7.6), 5 mM MgCl₂, 6 mM 2-mercaptoethanol and 50 mM KCl] and incubated
at 50°C. Aliquots (10 µl) were removed at 0, 10, 30 and 60 min and were
analyzed by 15% polyacrylamide/7 M urea gel electrophoresis. The gel
was stained with 0.05% toluidine blue.
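The quantities in the splicing reaction can be converted into working concentrations with simple arithmetic. The short sketch below uses only the amounts stated in the assay description; the variable names are illustrative and no molecular weight is assumed, so the enzyme is expressed in mass units only.

```python
# Amounts taken from the assay description; everything else is derived.
transcript_nmol = 0.2               # pre-tRNA transcript per reaction
enzyme_ug = 1.0                     # EndA per reaction
reaction_ul = 50.0                  # reaction volume in buffer D
aliquot_ul = 10.0                   # volume withdrawn per time point
time_points_min = [0, 10, 30, 60]   # sampling times

# nmol/ul is numerically equal to mmol/l, so multiply by 1000 for uM.
transcript_uM = transcript_nmol / reaction_ul * 1000.0

# ug/ul is numerically equal to mg/ml.
enzyme_mg_per_ml = enzyme_ug / reaction_ul

# Transcript loaded on the gel per aliquot.
transcript_per_aliquot_nmol = transcript_nmol * aliquot_ul / reaction_ul

# Total volume withdrawn over the time course (must fit in the reaction).
volume_withdrawn_ul = aliquot_ul * len(time_points_min)

print(f"transcript: {transcript_uM:.1f} uM")
print(f"enzyme: {enzyme_mg_per_ml:.3f} mg/ml")
print(f"per aliquot: {transcript_per_aliquot_nmol:.2f} nmol")
print(f"withdrawn: {volume_withdrawn_ul:.0f} of {reaction_ul:.0f} ul")
```

The four 10 µl aliquots account for 40 µl of the 50 µl reaction, so the sampling scheme fits within a single reaction tube.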



RESULTS AND DISCUSSION 

Overall structure 

We initially crystallized an ARMAN-2 EndA to obtain structural
information. Two different space groups were found in different
ARMAN-2 EndA crystals under the same crystallization conditions. One
crystal belonged to the orthorhombic space group P2₁2₁2₁, whereas the
other belonged to the trigonal space group P3₂. In this study, we
determined the structure of ARMAN-2 EndA from the latter crystal at
2.25 Å resolution (Figure 2A). Although the structure from the former
crystal could be solved at 2.00 Å resolution, it exhibited many
disordered regions due to the effects of crystal packing (data not
shown). The final model of ARMAN-2 EndA contains two molecules per
asymmetric unit. The two molecules are structurally almost identical
(R-factor = 21.8% and Rfree = 25.7% in Table 1).
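For readers less familiar with these statistics, the R-factor compares observed and calculated structure-factor amplitudes, and Rfree applies the same formula to a small set of reflections held out of refinement. The following is a minimal sketch of that definition only; the amplitudes and the choice of held-out reflection are invented toy values, not data from this study, and the function name is an assumption made for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Hypothetical amplitudes for ten reflections (toy values, arbitrary units).
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 90.0, 70.0, 50.0, 30.0, 10.0]
f_calc = [90.0, 85.0, 50.0, 45.0, 20.0, 95.0, 60.0, 55.0, 30.0, 12.0]

# Hold out 10% of the reflections (here simply the last one) as the free set.
free_flags = [False] * 9 + [True]

work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]

r_work = r_factor([o for o, _ in work], [c for _, c in work])
r_free = r_factor([o for o, _ in free], [c for _, c in free])

print(f"Rwork = {100 * r_work:.1f}%, Rfree = {100 * r_free:.1f}%")
```

In practice the free set is chosen at random before refinement begins, and refinement programs report both values directly.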

The ARMAN-2 EndA is composed of two ε protomers producing a
homodimeric subunit structure, ε₂ (Figure 2A). The overall shape of the
ε₂ homodimer structure is like a rectangular parallelepiped. The ε
protomer consists of 10 α helices and 23 β strands (Figure 2B and C).
Furthermore, the structure can be separated into three units, the αN
unit (residues 2-97; orange), the α unit (residues 126-288; pink) and
the βC unit (residues 301-387; cyan). The three units are connected by
two linkers, linker 1 (residues 98-125; black) and linker 2 (residues
289-300; black): linker 1 connects the αN and α units, whereas linker 2
connects the α and βC units. The αN unit (orange) is composed of a
mixed anti-parallel and parallel β sheet (β1-β5), four α helices
(α1-α4) and one β strand (β6). The β6 strand of the αN unit, the β7
strand of linker 1 and five β strands (β18, β19, β21, β22 and β23) of
the βC unit participate in forming one mixed anti-parallel and parallel
β sheet. This β7-β23 interaction probably prevents structural
fluctuation of linker 1. The βC unit (cyan) consists of one β sheet
(β18, β19, β21, β22 and β23), one α helix (α10) and one β strand (β20).
The β sheet is structurally sandwiched by two α helices (α4 and α10),
thereby stabilizing the unit interaction between the αN and βC units.
Furthermore, the β20 and β23 strands interact with the β16 and β17
strands in the α unit, respectively. These two anti-parallel β sheets
connect the βC and α units. The α unit (pink) can be separated into two
subdomains, the N-terminal and C-terminal subdomains. The N-terminal
subdomain is composed of a mixed anti-parallel and parallel β sheet
(β8-β12) and three α helices (α5-α7), and the C-terminal subdomain is
composed of a mixed anti-parallel and parallel β sheet (β13-β15 and
β17), two α helices (α8-α9) and one β strand (β16). The five α helices
(α5-α9) are placed around the two β sheets. Thus, these intra-unit
interactions probably contribute to maintenance of the structural
integrity of ARMAN-2 EndA. The configuration of secondary structures in
the αN and βC units overlaps with that of the α unit (Figure 2D). This
configuration is also commonly observed in the α and β subunits of the
α₂β₂ EndAs (19,23,32). Furthermore, our structure-based sequence
alignment analysis has shown that the overlap region of the αN, α and
βC units is found in the N-terminal subdomain of the α subunit in the
α′₂ type EndAs, the entire domain of the α subunit in the α₂β₂ type
EndAs and the C-terminal subdomain of the β subunit in the α₂β₂ type
EndAs (Supplementary Figure S2), although the βC unit includes an amino
acid sequence (373-387 in ARMAN-2 EndA) that is found in the α subunit
instead of the β subunit in the case of Crenarchaeal α₂β₂ type EndAs
(Supplementary Figure S2B). Based on these structural observations, we
have redefined the fourth type of archaeal EndA as ε₂, where ε is the
union of three units (αN-α-βC).
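The unit and linker boundaries given above partition the ordered part of the ε protomer into five contiguous segments. As a reading aid, the sketch below encodes those boundaries and looks up which segment a residue belongs to; the table and function names are illustrative, and the residues queried are ones named elsewhere in this paper.

```python
# Segment boundaries (inclusive residue numbers) as stated in the text.
UNITS = [
    ("alphaN unit", 2, 97),
    ("linker 1", 98, 125),
    ("alpha unit", 126, 288),
    ("linker 2", 289, 300),
    ("betaC unit", 301, 387),
]

def unit_of(residue):
    """Return the name of the unit or linker that contains a residue number."""
    for name, first, last in UNITS:
        if first <= residue <= last:
            return name
    raise ValueError(f"residue {residue} is outside the modeled range 2-387")

# Sanity check: each segment starts right after the previous one ends.
for (_, _, prev_last), (_, next_first, _) in zip(UNITS, UNITS[1:]):
    assert next_first == prev_last + 1

for label, number in [("K161", 161), ("Y236", 236), ("H251", 251),
                      ("K282", 282), ("D357", 357), ("W384", 384)]:
    print(f"{label}: {unit_of(number)}")
```

Under these boundaries the predicted catalytic residues (Y236, H251 and K282) and K161 fall in the α unit, whereas D357 and W384 fall in the βC unit.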

Structural comparison with three types of archaeal EndAs 

Our current structural study clarified the ε₂ subunit structure of
ARMAN-2 EndA (Figure 3). The architecture of the three units and
subunit interactions were far beyond our previous expectations because
two long linkers connect the three units in ARMAN-2 EndA. This
architecture is not observed in the other three types of EndAs
(Figure 1). Nevertheless, the overall shape and size of the ε₂
structure of ARMAN-2 EndA are very similar to those of the other three
types of archaeal EndAs (Figure 1 and Supplementary Figure S3). In
addition to the three types of archaeal EndAs, a structural homology
search by the Dali server (33) confirms that the structure of the βC
and α units in ARMAN-2 EndA is homologous to that of a subunit (SEN15)
of human EndA and that of prokaryotic DNA restriction enzymes.

As shown in Figures 1 and 3, two β-β strand interactions at the domain
interface are conserved in all four types of archaeal EndAs (24). The
interactions are shown to be responsible for intra/interunit
interactions such as the α-α subunit assembly in the α₄ type EndA, the
α-α domain assembly in the α′₂ type EndA and the α-β subunit assembly
in the α₂β₂ type EndA (11,19,32). However, in the case of ε₂ ARMAN-2
EndA, the β-β strand (β22-β23) interaction does not directly contribute
to the interaction between the α and βC units since the β22 and β23
strands are both parts of the C-terminal βC unit (Figures 2C and 3A).
Instead, two anti-parallel β strand interactions (β20-β16 and β23-β17)
connect the α and βC units (Figure 2C). Furthermore, the β6-β18 strand
interaction appears to contribute to the assembly of the αN and βC
units together with formation of a sandwich by the β sheet of the βC
unit and two α helices (α4 and α10). Because the three units are
connected by two linkers, they can form a complete ε protomer more
easily than separate subunits can assemble, as in the α-α subunit
assembly in the α₄ type EndA and the α-β subunit assembly in the α₂β₂
type EndAs. Therefore, the linkers play an important role in the
three-unit architecture. In contrast, as previously expected from our
bioinformatics study (24), the three-unit molecule assembles with
another through the interaction of a negatively-charged L10 loop with a
positively-charged pocket of the α unit in the opposing three-unit
molecule (Figure 3B). These electrostatic interactions are observed in
other EndA structures (Figure 1), suggesting the structural and/or
functional importance of these interactions. Figure 3B shows the
molecular interaction of the L10 loop and positively-charged pocket in
the ARMAN-2 EndA. Two salt-bridge interactions (D357-K228 and
E359-R234) are observed. The positively and negatively-charged amino
acid residues are conserved in almost all EndAs as reported previously
(24). In fact, our previous mutagenesis study has shown that the D357A
mutant of ARMAN-2 EndA barely cleaves the introns from pre-tRNAIle and
pre-tRNACys (24), suggesting that the salt-bridge interaction
(D357-K228) is required for the formation of the functional ε₂
homodimer of ARMAN-2 EndA. Although we could not observe the
dissociation of the ε₂ homodimer into ε protomers in the D357A mutant
in our cross-linking analysis (24), the breakage of the salt-bridge
interaction (D357-K228) probably induces a conformational change of the
side chain of K228. In the ARMAN-2 EndA structure, the side chain of
K228 is located close to that of R275, which is the putative RNA
recognition residue as described in the next section, at a distance of
4.2 Å between K228 Cγ and R275 Nη2. Thus, the conformation of the side
chain of R275 can also be changed by the electrostatic repulsion
between K228 and R275 in the D357A mutant, resulting in the loss of
intron-cleavage activity.

Figure 2. Crystal structure of ARMAN-2 EndA. (A) Ribbon stereo diagram
of the overall structure of the functional ε₂ homodimeric complex,
where ε stands for the union of three units (αN-α-βC). The αN unit, α
unit, βC unit and two linker regions are colored orange, pink, cyan and
black, respectively. The N- and C-terminal ends are labeled as N and C,
respectively. (B) Ribbon diagram of the ε protomer. The α helices and β
strands are labeled in order as α and β, respectively. The αN unit, α
unit, βC unit and two linker regions are colored as described above.
(C) A secondary structure topology diagram of the ε protomer. The α
helices and β strands are represented by circles and triangles,
respectively. The αN, α and βC units are colored as in Figure 2A and B.
(D) Superimposition diagram of Cα atoms of the αN (orange) and βC
(cyan) units onto that of the α unit (pink) of ARMAN-2 EndA.

Figure 3. Structural properties of ARMAN-2 EndA. (A) The structure of
ARMAN-2 EndA is shown in the same way as in Figure 1. Left, cartoon
representation of the structural model of the ε₂ type ARMAN-2 EndA.
Right, ribbon diagram of the ε₂ type ARMAN-2 EndA. The αN unit, α unit,
βC unit and two linker regions are colored as in Figure 2. (B) Close-up
view of the electrostatic interaction between the positively-charged
pocket and negatively-charged L10 loop responsible for dimer formation
in ARMAN-2 EndA. Two salt bridges (K228-D357 and K234-E359) are
highlighted as stick models.
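Interatomic distances such as the K228 Cγ to R275 Nη2 separation quoted above are Euclidean distances between atomic coordinates in the refined model. The sketch below shows the calculation only; the coordinates are placeholders invented for illustration and are not taken from the deposited model (PDB code 4FZ2), so the printed value is not expected to reproduce 4.2 Å. The atom names CG and NH2 follow standard PDB naming for lysine Cγ and arginine Nη2.

```python
import math

# Placeholder coordinates in angstroms, invented for illustration only.
# They are NOT taken from the deposited ARMAN-2 EndA model.
atoms = {
    ("K228", "CG"): (12.0, 5.0, 3.0),
    ("R275", "NH2"): (14.0, 8.0, 5.0),
}

def atom_distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)

d = atom_distance(atoms[("K228", "CG")], atoms[("R275", "NH2")])
print(f"K228 CG to R275 NH2: {d:.2f} A")
```

With real coordinates, the same function could be applied to the salt-bridge pairs discussed above to compare side-chain separations between the wild-type model and any mutant model.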

The active site 

It has been reported that three catalytic residues (tyrosine,
histidine and lysine) as well as two substrate recognition residues
(two arginines) are conserved in the α subunit of the EndA from
Euryarchaea (6,11). Our structural study and amino acid sequence
alignment strongly suggest that the Y236, H251 and K282 residues are
the catalytic residues and that R275 and W384 are the possible
substrate recognition residues in the case of the ARMAN-2 EndA
(Supplementary Figure S2B). These five residues are located on the
enzyme surface around the expected catalytic pocket (Figure 4A) and can
be arranged at locations similar to those of their counterpart residues
in the three types of archaeal EndAs (Supplementary Figure S3). To
clarify whether the catalytic triad of ARMAN-2 EndA is indeed formed by
the three predicted residues (Y236, H251 and K282), we constructed
three alanine mutants (Y236A, H251A and K282A) and then performed an
intron-cleavage assay with the mutants. Prior to the mutant study, we
optimized the assay conditions by using the wild-type ARMAN-2 EndA and
two pre-tRNA transcripts (pre-tRNAIle and pre-tRNACys) previously used
as substrates (24). The pre-tRNAIle and pre-tRNACys contain a strict
BHB motif at the canonical position and a relaxed BHB motif at a
non-canonical position, respectively (Figure 4B and C). The ARMAN-2
EndA completely removed the introns from both the pre-tRNAIle and
pre-tRNACys within 60 min. Next, we assayed for the removal of introns
by the three mutants, Y236A, H251A and K282A. All three mutants failed
to remove the introns from both the pre-tRNAIle and pre-tRNACys
(Figure 4D and E), suggesting that the Y236, H251 and K282 residues
play an important role as the catalytic triad of ARMAN-2 EndA. Of the
RNA recognition residues (R275 and W384) of ARMAN-2 EndA, the
tryptophan is conserved only in the α₂β₂ type EndAs from Crenarchaea
and Nanoarchaea (Supplementary Figure S2B). Instead of the tryptophan
residue, an arginine residue is conserved in the α′₂ and α₄ EndAs from
Euryarchaea. Two arginine residues in the α′₂ EndA capture the adenine
base in the first bulge of the BHB motif by cation-π interactions
(6,11). A tryptophan residue can act as an alternative for the arginine
because its indole ring can form a hydrophobic interaction with the
nucleotide instead of the cation-π interaction.

Figure 4. The active site of ARMAN-2 EndA. (A) Close-up view of the
active site. The catalytic triad comprising three catalytic residues
(Y236, H251 and K282) and two putative RNA recognition residues (R275
and W384) are shown as stick models (green). (B) Time-dependent
cleavage of ARMAN-2 pre-tRNAIle (UAU) by the wild-type ARMAN-2 EndA.
The predicted secondary structure of ARMAN-2 pre-tRNAIle (UAU), labeled
with two arrows indicating the splicing sites, is shown at the left
side of the gel. (C) Time-dependent cleavage of ARMAN-2 pre-tRNACys
(GCA). The predicted secondary structure of ARMAN-2 pre-tRNACys (GCA)
indicating the splicing sites is shown at the left side of the gel.
(D) Cleavage activities of the wild-type and three mutants (Y236A,
H251A and K282A) using ARMAN-2 pre-tRNAIle (UAU). (E) Cleavage
activities of the wild-type and three mutants (Y236A, H251A and K282A)
using ARMAN-2 pre-tRNACys (GCA). Reaction mixtures were separated on
15% polyacrylamide/7 M urea gels. The cleaved products are indicated by
arrows at the right side of the gel.

Broad substrate specificity 

Because the catalytic and substrate recognition residues of ARMAN-2
EndA are conserved in all types of archaeal EndAs as described above,
these residues are probably not involved in the broad substrate
specificity of the ARMAN-2 EndA. We searched for the specific regions
responsible for the specificity of ARMAN-2 EndA based on the
structure-based sequence alignment. As a result, two specific regions
were found (Supplementary Figure S2B and S2C, highlighted in cyan and
green). To confirm the findings, we performed a structural comparison
of the specific regions of ARMAN-2 EndA and their counterparts, namely,
the α₂β₂ type Aeropyrum pernix (APE)-EndA and Nanoarchaeum equitans
(NEQ)-EndA (Figure 5A and B). Figure 5A shows the specific region
(residues 158-168) of ARMAN-2 EndA, which forms a loop structure on the
enzyme surface close to the catalytic triad. The ARMAN-2 type-specific
loop (ASL) is positioned at a similar location to the CSL in APE-EndA,
which plays a significant role in the broad substrate specificity. The
conformation of the ASL resembles that of the CSL, although amino acid
similarity and identity are not found with the exception of one
positively-charged residue (K161 in ARMAN-2 EndA and K44 in APE-EndA in
Figure 5A, right panel). Notably, K161 in the ASL and K44 in the CSL
point in the same direction. The K44 residue in APE-EndA has been shown
to be essential for the enzymatic activity and broad substrate
specificity (23). Accordingly, we hypothesized that the K161 residue in
the ASL is involved in the splicing activity and broad substrate
specificity of ARMAN-2 EndA. When we superimposed the structure of
ARMAN-2 EndA onto that of NEQ-EndA (Supplementary Figure S3D and S3E),
another specific loop (residues 240-247) of ARMAN-2 EndA was found to
be positioned in the same way as the specific loop (residues 90-98) of
NEQ-EndA (Figure 5B). These loops are close to the catalytic triad, and
the catalytic His residue is located in both of these specific loops.
Furthermore, it was expected that some positively-charged residues
(K93, K94, K96 and R97) in the specific loop of NEQ-EndA would be
important for the broad specificity (32,34). We hypothesized that the
K241 and K244 residues of ARMAN-2 EndA may correspond to the
positively-charged residues of NEQ-EndA, although there is no sequence
similarity or identity between the ARMAN-2 and NEQ-EndAs in this
specific region.

To examine whether these residues (K161, K241 and
K244) contribute to the enzymatic activity and
broad substrate specificity of ARMAN-2 EndA, we
constructed three alanine mutants (K161A, K241A and
K244A) and subjected them to an intron cleavage
assay (Figure 6A and B). The wild-type ARMAN-2 EndA
and the K241A and K244A mutants removed introns from
both pre-tRNAIle and pre-tRNACys. In contrast, the
K161A mutant did not cleave the introns. These results
demonstrate that the K161 residue is essential for
the cleavage of introns with strict and relaxed BHB
motifs. To understand the importance of the K161
residue structurally, we constructed a docking model of
the ARMAN-2 EndA-RNA complex based on the reported
structure of the α′2 type AFU-EndA in complex with RNA
(Figure 7A) (6). The RNA in the reported complex
contains a BHB motif. The docking model shows that
the K161 residue is situated near the 3′-phosphate group
adjacent to the bulge structure of the RNA, suggesting
that the K161 residue captures this 3′-phosphate group
(or the 3′-phosphate of the third nucleotide in the loop
structure), fixes the substrate, and is thereby essential
for cleavage activity. To clarify whether the K161 residue in
the ASL plays a key role in determining the substrate
specificity, we first created an AFU-EndA mutant
protein (AFU-ASL) in which Lys175 was replaced by
the ASL sequence (GTYKVSEH) of ARMAN-2 EndA
(Figure 7B and C). We also made an additional mutant
(ASL-K178A), in which the K178 residue of the
AFU-ASL mutant, corresponding to the K161 residue of
ARMAN-2 EndA, was replaced with alanine. We then
analyzed the substrate specificity of these two mutants.
As shown in Figure 7D, the AFU-ASL and ASL-K178A
mutants cleaved the intron with a strict BHB motif from
the anticodon loop in a manner similar to wild-type
ARMAN-2 EndA and AFU-EndA. The wild-type AFU-EndA and
the ASL-K178A mutant, however, barely cleaved the intron
with a relaxed BHB motif from the T-loop of pre-tRNACys
(Figure 7E). In contrast, the AFU-ASL mutant cleaved
the intron from pre-tRNACys as effectively as the
wild-type ARMAN-2 EndA, although a cleavage fragment
consisting of the 5′-half with the intron was also
observed. Thus, these results clearly demonstrate that
insertion of the ASL conferred ARMAN-2 EndA-like broad
substrate specificity on AFU-EndA, which otherwise has
narrow substrate specificity. Furthermore, they
demonstrate that the K161 residue of ARMAN-2 EndA plays
a key role in the broad substrate specificity by acting
as the RNA recognition site, in a similar way to the K44
residue in the CSL of APE-EndA (23).

Evolution of the fourth type of archaeal EndA 

With respect to the evolution of the three archaeal EndA
families (α′2, α4 and α2β2), it has been proposed that the
α subunit gene of the α4 type EndA was first duplicated and
that one copy was then subfunctionalized to encode the β
subunit, suggesting that the α and β subunits are
evolutionarily derived from the same origin (9). Because of
the striking structural and sequence similarities between
ARMAN-2 EndA and the three other types of archaeal
EndAs (Figure 2, Supplementary Figure S2 and S3), the ε
protomer of ARMAN-2 EndA is also likely to
share the common evolutionary origin of the α and β
subunits.

Figure 5. Comparison of the ASL, the Crenarchaea-specific loop (CSL) and the NEQ-specific loop. (A) Left: superimposed structures of ARMAN-2 EndA and
APE-EndA. Ribbon diagrams of ARMAN-2 EndA and APE-EndA are colored as in Figure 2A and D. Right: close-up view of the
ASL region (pink) of ARMAN-2 EndA superimposed on the CSL region (grey) of APE-EndA. The catalytic triad, comprising three
catalytic residues (Y236, H251 and K282), is shown as a stick model (green). The structure-based sequence alignment is shown below the
superimposed structures. The conserved K161 in the ASL and K44 in the CSL are highlighted in red. (B) Left: superimposed structures of ARMAN-2 EndA
and NEQ-EndA. Ribbon diagrams of ARMAN-2 EndA and NEQ-EndA are colored as in Figure 2A and Supplementary Figure S3D, respectively.
The α and β subunits of NEQ-EndA are shown in wheat and lime green, respectively. Right: close-up view of the insertion loop
(pink) of ARMAN-2 EndA superimposed on the corresponding loop (wheat) of NEQ-EndA. The positively charged Lys and Arg
residues are shown as stick models. The structure-based sequence alignment is shown below the superimposed structures. The
positively charged residues in the insertion loops of ARMAN-2 EndA and NEQ-EndA are highlighted in red. The catalytic triad, comprising
three catalytic residues (Y236, H251 and K282), is shown as a stick model (green). Full names of the archaeal species are as follows: APE, Aeropyrum
pernix; NEQ, Nanoarchaeum equitans.

The uncultured acidophilic archaeon ARMAN-2 and
its lineages were predominantly found in a
chemoautotrophic biofilm growing in acidic, metal-rich
solutions (25). In the biofilm, several eubacteria and
archaea, including members of the order Thermoplasmatales,
are found. Intriguingly, 3D cryo-electron tomographic
reconstruction showed that the ARMAN lineages physically
connect to the Thermoplasmatales (35).
Furthermore, a virus was found on the cell wall of
the ARMAN lineages, indicating infection of the
ARMAN lineages by the virus. Therefore, genetic
diversity may arise in the biofilm community via horizontal
and/or lateral gene transfer. In fact, ARMAN-2 has many
genes homologous to those of Crenarchaea and eubacteria
despite its phylogenetic affiliation with the deeply branched
Euryarchaea. It is noteworthy that our previous (24) and
current studies demonstrate recombination of the
EndA gene in ARMAN-2 cells. Our structure-based
sequence alignment shows that the αN unit is homologous
to the N-terminal subdomain of the α subunit from
Euryarchaeal EndAs, and that the α and βC units share
homology with the α subunit and the C-terminal subdomain
of the β subunit from Crenarchaeal EndAs, respectively
(Supplementary Figure S2). Accordingly, ARMAN-2
EndA appears to have undergone a genetic recombination
of the three subunits. As a result, the αN, α and βC units
are now found as the structural and functional
elements of ARMAN-2 EndA. At the end of the βC unit
(Figure 2 and Supplementary Figure S2), the amino acid
sequence (residues 373-387), which folds into
the β23 strand, is similar to that of the C-terminal
subdomain of the α subunit from Crenarchaeal EndAs.
At position 384, a tryptophan residue responsible
for RNA recognition is found only at the end of the
α subunit of the EndAs from Crenarchaea and
Nanoarchaea. Given these findings, it is likely that the
C-terminal subdomain of the Crenarchaeal β subunit was
incorporated into the end of the Crenarchaeal α
subunit, resulting in the formation of the α-βC unit of
ARMAN-2 EndA. Thus, this can be understood as a
naturally occurring example of so-called 'domain
shuffling'.

The two specific loop regions of ARMAN-2 EndA were
proposed, on the basis of our structural comparison, as
candidates responsible for the broad substrate
specificity (Figure 5). One of these, the ASL, has been
shown to be involved in the broad substrate specificity
of ARMAN-2 EndA (Figures 6 and 7). The ASL probably has
the same function as the CSL of Crenarchaeal EndA.
However, no significant sequence similarity is found
apart from the conserved Lys residue that functions as
the substrate recognition site. This suggests that the
ASL was acquired through an evolutionary pathway
independent of that of the CSL, that is, by so-called
'convergent evolution'. In
contrast, the other specific loop is positioned at a location
similar to the specific loop of NEQ-EndA (Figure 5B), but
has not been shown to be involved in the enzymatic
activity or substrate specificity (Figure 6A and B). If
this loop had consecutive positively charged residues like
that of NEQ-EndA, ARMAN-2 EndA might possess broad
substrate specificity even without the ASL. In any
case, there seem to be two different structural strategies
for obtaining broader specificity, as observed in the α2β2
and ε2 types of archaeal EndAs, which were gained by
convergent evolution. First, the conserved Lys residue on
either the ASL or the CSL adjacent to the active site
functions as the substrate recognition site. Second, the
consecutive positively charged residues on the specific
loop that includes the catalytic His residue function as
the substrate recognition site. In the case of ARMAN-2
EndA, the second strategy may have been lost because of an
earlier acquisition of the ASL.

A bona fide role for tRNA introns remains unclear
except for the methylation of the 2′-O-ribose of G32
and G34 in tRNATrp from Haloferax volcanii (36).
Randau and Söll have argued that the gain of tRNA
introns provides protection against integration of
mobile genetic elements, such as conjugative plasmids
and viruses (37). In ARMAN-2, a total of 56% of tRNA
genes are interrupted by introns with either strict or
relaxed BHB motifs (23). In contrast, the lineages
ARMAN-4 and ARMAN-5 have only strict BHB motif
introns, in 15% of tRNA genes, consistent with their
possession of the α4 type EndA (24). Because the prototype
of archaeal EndA was proposed to be an α4 type (9), the
transition from the α4 to the ε2 type might have allowed an
increase in the number and diversity of tRNA introns at
non-canonical positions in ARMAN-2 cells. Furthermore,
the CRISPR immune system that protects against viruses (38)
is absent from the genomes of all three ARMAN groups.
Therefore, inclusion of the ASL in the ε2 type
ARMAN-2 EndA may have expanded the number of disrupted
tRNA genes as a defense against the integration of mobile
genetic elements, as previously proposed for the
inclusion of the CSL in the Crenarchaeal EndA (23).
Moreover, given the reports demonstrating that the
gain of tRNA introns occurred relatively recently
(39,40), incorporation of the ASL region into
ARMAN-2 EndA might have been a significant advantage
for the survival of ARMAN-2 cells in the biofilm
community.

In conclusion, our structural study of ARMAN-2
EndA, the fourth type of archaeal EndA, has
revealed the precise architecture of the ε protomer, which
consists of three units (αN, α and βC). The three-unit
protomers form the ε2 homodimer. There is striking
structural and functional similarity among all four types
(α′2, α4, α2β2 and ε2) of archaeal EndAs, suggesting that
the four types of archaeal EndAs are derived from a common
ancestor. However, the two linker loops connecting the
three units and the ASL are distinct in ARMAN-2
EndA. The two linkers play an important role in
facilitating the three-unit formation, and the ASL confers
the broad substrate specificity on ARMAN-2 EndA. Furthermore,
our structure-based sequence alignment of ARMAN-2
EndA shows a trace of gene recombination of the α and
β subunits from Euryarchaeal and Crenarchaeal EndAs.
These results broaden our understanding of the mechanism
underlying gain of function in protein architecture. In
the ASL, the K161 residue functions as an RNA
recognition site and thereby broadens the specificity of
ARMAN-2 EndA. The ASL has arisen by convergent
evolution to play a role similar to that of the CSL of
Crenarchaeal EndA. Inclusion of the ASL in ARMAN-2
EndA may have allowed an increase in the number and
diversity of tRNA introns for protection from mobile
genetic elements. Thus, our findings further support the
possibility of coevolution of the archaeal EndA
architecture and disrupted tRNA genes.

Figure 7. The conserved K161 residue in the ASL is responsible for broad substrate specificity. (A) Model of the complex formed between
ARMAN-2 EndA and an RNA substrate (stick model, green) that contains a BHB motif (Left). The dotted square shows the active site.
Close-up view of the active site of the enzyme-RNA complex (Right). The stick model in grey shows the bulge structure (B1-B2-B3) in the BHB
motif. K161 in the ASL is shown as a stick model (pink). (B) Ribbon diagram of the structure of the ASL region of ARMAN-2 EndA (pink)
superimposed on the structure of the corresponding region of AFU-EndA (blue). The insertion positions (E174 and G176), where the ASL peptide
(G158-H165) was inserted to create the AFU-ASL chimera, are indicated in red. (C) Schematic diagram of creation of the AFU-ASL chimera: the
activities of wild-type ARMAN-2 EndA, AFU-EndA and its mutants (AFU-ASL and ASL-K178A): (D) ARMAN-2 pre-tRNAIle (UGU); (E)
ARMAN-2 pre-tRNACys (GCA). The cleaved products are indicated by arrows at the right side of the gel.